Center Level Statistics

Description

  1. Assess how much the LISTING_CTR_CODE impacts time to offer. The hypothesis is that more remote centers (less nearby donors) have longer wait times and consequentially worse waitlist outcomes. Another reason could be the center specifying tight range of when to make a match run (e.g., tight donor/candidate weight ratio).

  2. Assess a center’s acceptance practice. The hypothesis is that centers that are more selective in donors will have longer wait times until transplants (due to the more frequent refusals) and consequentially worse waitlist outcomes.

Load all required data

Assess how much the LISTING_CTR_CODE impacts time to offer.

This section builds a cox-ph model to estimate the center impact on time to offer. The model controls for several candidate variables that could impact time to offer: Age, Weight, Status, Blood Type, and Year.

The Listing Center ID is treated as a random effect and the estimated coefficients are used to assess how much longer or shorter their candidates have to wait for an offer. There are two main reasons for center effects: the geographic location of the listing center and how narrowly it determines it wants to be notified for an offer (e.g., the min and max weight range of the donor to be placed on match run).

There is one record per candidate related to the first time the candidate was placed on active status (not temporarily inactive).

Candidate population:

  • On a HR waitlist (not HL)
  • Under 18 at time of offer
  • Added to the waitlist between 2010-01-01 and 2020-12-31

Note: we are using time from waitlist to the match run time when donor was offered to candidate. A better value would be the actual offer time, but the initial response data is not very reliable (e.g., there are some delays in entering the initial response time).

Calculate waitlist time at status

Only consider candidates on HR waitlists (not HL or LU)

  1. Get time candidate added to waitlist
  2. Get time ranges candidate is at each status
  3. Get time candidate removed from waitlist
  4. There can be multiple entries for each waitlist corresponding to multiple status changes during the patient’s time on waitlist
Code
WL_STATUS = WL_hist %>% 
  #: only consider HR waitlist (not HL)
  filter(ORG == "HR") %>% 
  #: calculate time at status
  group_by(WL_ID_CODE) %>%     
    arrange(CHG_DT) %>% 
    filter(lag(STATUS) != STATUS | is.na(lag(STATUS)) | CHG_TY == "D") %>% 
    transmute(
      WL_ID_CODE, 
      STATUS, 
      CHG_TY, # type of status change (A = Added, M = Modified, D = removal)
      WL_DT = CHG_DT,# date-time of start in new status (or added)
      # number of days at status (before change or removal)
      days_at_status = as.numeric(difftime(lead(CHG_DT), CHG_DT, units = "day")),
      seq = min_rank(CHG_DT),           # sequence of status changes
      final = 1L*(lead(CHG_TY) == "D"), # indicator of final status change
    ) %>% 
  ungroup() %>% 
  #: Remove observation related to waitlist removals
  filter(CHG_TY != "D") %>% 
  arrange(WL_ID_CODE, WL_DT)

Join the PTR (offer) data to the waitlist data

  • Only consider candidates < 18 (at time of offer).
  • Adds time of next offer the candidate received for each waitlist listing
  • If the MATCH_RUN_DT is after the next status change, then the offer doesn’t occur in the given waitlist-status time (i.e., censoring)

Outcome Variables:

  • wait_days is the number of (fractional) days from waitlist change to next offer
  • event is the binary indicator that an offer occurred before the next waitlist status change. If event = 1 then an offer was given to candidate before the next status change and event = 0 if the status change occurred first (i.e., censoring).
  • time is the elapsed time from status change to either offer or another status change. This is the main variable for modeling time to event.
Code
#: Join the PTR (offer) data to the waitlist data
WL_PTR = inner_join(
  x = WL_STATUS, 
  y = PTR_all %>% filter(AGE_PT < 18) %>% # only keep candidates < 18
    select(WL_ID_CODE, LISTING_CTR_CODE, MATCH_SUBMIT_DT, AGE = AGE_PT, PTR_ID),
  by = join_by("WL_ID_CODE", closest(WL_DT <= MATCH_SUBMIT_DT))
  ) %>% 
  mutate(
    wait_days = as.numeric(difftime(MATCH_SUBMIT_DT, WL_DT, units = "day")),
    event = ifelse(days_at_status >= wait_days | is.na(days_at_status), 1L, 0L), 
    time = pmin(wait_days, days_at_status, na.rm=TRUE)
  )

Make data to model time until offer

We only keep one record per candidate/waitlist so we don’t overrepresent the candidates that had many refusals and offers. While there are few options, we decided to keep the time to event for the candidate’s first active waitlist. Some candidates are initially listed as temporarily inactive.

Predictor Variables:

  • day is the number of (fractional) days since 2010-01-01. Used to account for temporal trends in time to offer.
  • ABO is blood type
  • WEIGHT_KG is candidate’s weight (at time of listing)
  • AGE is the candidate’s AGE (at time of offer)
  • STATUS is the candidate’s waitlist status (1A, 1B, 2) at time of offer.
Code
#: get relevant data from thoracic
thor = 
  thoracic %>% 
  transmute(
    WL_ID_CODE, 
    WEIGHT_KG = coalesce(INIT_WGT_KG_CALC, WGT_KG_TCR), 
    ABO = recode(ABO, A1 = "A", A2 = "A", A1B = "AB", A2B = "AB", 
         .missing = "Unknown")
    )

#: make data for modeling
data_cox = 
  WL_PTR %>% 
  #: get first *active* status for each candidate/waitlist
  filter(STATUS != "Temporarily Inactive") %>% 
  slice_min(WL_DT, by = WL_ID_CODE, with_ties = FALSE) %>% 
  #: date ranges
  filter(
    as.Date(WL_DT) >= as.Date("2010-01-01"), 
    as.Date(WL_DT) <= as.Date("2020-12-31"), 
  ) %>% 
  #: Add predictor variables
  left_join(thor, by = "WL_ID_CODE") %>% # add thoracic data
  #: Select relevant data for modeling 
  transmute(
    # waitlist info
    WL_ID_CODE,
    WL_DT,
    # outcomes
    time,
    event,
    # predictors
    WEIGHT_KG,
    LISTING_CTR_CODE = factor(LISTING_CTR_CODE), 
    STATUS = factor(STATUS), 
    AGE,
    day = as.numeric(difftime(WL_DT, as.Date("2010-01-01"), units = "day"))-1,
    ABO = factor(ABO)
  )

summary(data_cox)
   WL_ID_CODE          WL_DT                             time              event       
 Min.   :   5088   Min.   :2010-01-04 17:20:02.50   Min.   :  0.0004   Min.   :0.0000  
 1st Qu.:1082778   1st Qu.:2013-03-04 16:34:08.85   1st Qu.:  3.1416   1st Qu.:1.0000  
 Median :1245614   Median :2015-11-10 17:16:40.97   Median :  9.4437   Median :1.0000  
 Mean   :1243462   Mean   :2015-10-08 22:02:05.31   Mean   : 20.6929   Mean   :0.8073  
 3rd Qu.:1407406   3rd Qu.:2018-06-27 16:50:42.75   3rd Qu.: 24.5026   3rd Qu.:1.0000  
 Max.   :1572456   Max.   :2020-12-31 20:38:07.99   Max.   :715.2160   Max.   :1.0000  
                                                                                       
   WEIGHT_KG       LISTING_CTR_CODE       STATUS          AGE          
 Min.   :  2.000   19437  : 298     Status 1A:3927   Min.   : 0.01046  
 1st Qu.:  6.225   00124  : 278     Status 1B:1071   1st Qu.: 0.57462  
 Median : 14.800   00248  : 258     Status 2 :1089   Median : 4.32979  
 Mean   : 25.731   19468  : 253                      Mean   : 6.59012  
 3rd Qu.: 41.200   13795  : 241                      3rd Qu.:12.90929  
 Max.   :177.500   00403  : 236                      Max.   :17.99826  
 NA's   :1         (Other):4523                                        
      day           ABO      
 Min.   :   2.722   A :2192  
 1st Qu.:1157.690   AB: 237  
 Median :2138.720   B : 807  
 Mean   :2105.918   O :2851  
 3rd Qu.:3098.702            
 Max.   :4016.860            
                             

Fit Cox-PH Model

Predicting time (time until offer) with censoring indicated by event. Using smooth terms (i.e., thin-plate regression splines) for WEIGHT_KG, AGE, and day. Using random effect terms for ABO and LISTING_CTR_STATUS. Including STATUS unpenalized. Parameters estimated using REML.

Code
library(mgcv)
cox_gam = 
  mgcv::gam(time ~ STATUS + 
                    s(WEIGHT_KG) + 
                    s(AGE) + 
                    s(day) + 
                    s(ABO, bs = "re") +                 
                    s(LISTING_CTR_CODE, bs = "re"),
            weights = event,          # censoring indicator
            family = mgcv::cox.ph(),  # Cox PH model
            method = 'REML',
            data = data_cox)
summary(cox_gam)

Family: Cox PH 
Link function: identity 

Formula:
time ~ STATUS + s(WEIGHT_KG) + s(AGE) + s(day) + s(ABO, bs = "re") + 
    s(LISTING_CTR_CODE, bs = "re")

Parametric coefficients:
                Estimate Std. Error z value Pr(>|z|)    
STATUSStatus 1B -0.42857    0.04407  -9.724   <2e-16 ***
STATUSStatus 2  -1.37097    0.04953 -27.678   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                       edf Ref.df Chi.sq p-value    
s(WEIGHT_KG)         8.683  8.967 291.91  <2e-16 ***
s(AGE)               8.526  8.936 154.95  <2e-16 ***
s(day)               6.729  7.840  41.96  <2e-16 ***
s(ABO)               2.937  3.000 360.34  <2e-16 ***
s(LISTING_CTR_CODE) 50.261 91.000 427.34  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Deviance explained = 6.66%
-REML =  36992  Scale est. = 1         n = 6086

Get Listing Center Effects

Note: positive coefficients represent fast time to offer. These are centers that have shorter time to offer than other centers (after controlling for age, weight, blood type, status, and year). The centers’ with negative coefficients have longer waiting until offer.

Code
library(gratia)
LC = smooth_estimates(cox_gam, select = "s(LISTING_CTR_CODE)") %>% 
  select(
    LISTING_CTR_CODE, LC_effect = .estimate
  ) %>% 
  arrange(LC_effect)

Density estimation of LC effects:

Code
LC %>% ggplot(aes(LC_effect)) + geom_density() + geom_rug()

Save results

  • LC_effect: the coefficient for listing center in the cox-ph regression. Higher (positive) value means shorter wait time. Smaller (negative) value means longer wait time.
  • total: total number of candidates listed at center
  • Status {1A, 1B, 2}: number of candidates at each status
Code
(
LC_effects = 
    data_cox %>% count(LISTING_CTR_CODE, STATUS) %>% 
    spread(STATUS, n, fill = 0L) %>% 
    mutate(total = `Status 1A` + `Status 1B` + `Status 2`) %>% 
    left_join(
      LC, 
      by = "LISTING_CTR_CODE"
    ) %>% 
    relocate(LC_effect, .after=1)
)
Code
LC_effects %>% write_csv("data/LC_effects.csv")

Median Wait Time

Estimates the median time from listing to first offer per center. Note that this ignores status changes; it only considers time from initial waitlisting to first offer. Many candidates change status before first offer, but this doesn’t account for it.

This is different than how the Listing Center effects are estimated in the above section which limits analysis to first status period.

Join waitlist data to PTR

Code
#: Get waitlist status and waitlist organ at time of match run
WL_STATUS = 
  PTR_all %>% select(PTR_ID, WL_ID_CODE, MATCH_SUBMIT_DT) %>%
  left_join(
    WL_hist, 
    join_by(WL_ID_CODE, closest(MATCH_SUBMIT_DT > CHG_DT ))
  ) %>% 
  select(PTR_ID, ORG_CAND = ORG, STATUS_AT_OFFER = STATUS)


#: get date candidate added to waitlist
#   This should match thoracic$INIT_DATE and INIT_STAT
#   there are only a handful of cases where INIT_DATE < WL_DATE (mostly  < 2004)
cand_listed_date = 
  WL_hist %>%
  filter(CHG_TY == "A") %>%          # date activated on waitlist
  distinct() %>%
  transmute(
    WL_ID_CODE, 
    WL_DATE = as.Date(CHG_DT), 
    STATUS_AT_LISTING = STATUS
  )

#: add data to PTR 
PTR = PTR_all %>% 
  # add PT_CODE, ORG_CAND, and STATUS_CAND to PTR
  left_join(WL_STATUS, by="PTR_ID") %>%  
  # add data candidate placed on waitlist
  left_join(cand_listed_date, by = "WL_ID_CODE")  
  # left_join(thoracic %>% select(WL_ID_CODE, STATUS_AT_LISTING = INIT_STAT, WL_DATE = INIT_DATE), 
  #      by = "WL_ID_CODE")

#----------------------------------------------------------------------------#
## Make offer data ----
#  NOTES: 
#   - ONLY offers from DONORS < 30 years old are included! 
#   - Candidates < 18; on HR waitlist (not HL)
#   - Candidates added to waitlist between 2010 and 2019
#   - Our offer data is from 2010-march 2021, so there will be some
#     candidates still on waitlist at march 2021. 
#----------------------------------------------------------------------------#

offers = PTR %>% 
  # add year and if offered donor was eventually utilized
  mutate(
    MATCH_RUN_YEAR = format(MATCH_SUBMIT_DT, "%Y"),
    donor_utilized = ifelse(is.na(MAX_ACCEPT_SEQ), 0, 1)
  ) %>% 
  # get population of interest
  filter(
    AGE_PT < 18,        # Pediatric Candidate
    ORG_CAND == "HR",   # Candidate only on HR waitlist (not HL)
    WL_DATE >= as.Date("2010-01-01"), # listed after 2010 (so we have offer data)
    WL_DATE <= as.Date("2019-12-31"), # listed before 2020
  ) %>% 
  select(
    PTR_ID, WL_ID_CODE, LISTING_CTR_CODE, DONOR_ID,
    AGE_PT, ORG_CAND, STATUS_AT_OFFER, STATUS_AT_LISTING,
    WL_DATE,
    MATCH_SUBMIT_DT, MATCH_RUN_YEAR,
    INITIAL_RESPONSE_DT, 
    OFFER_ACCEPT, PTR_SEQUENCE_NUM,
    donor_utilized
  )

offers

Get waiting time until first offer

First offer time:

Code
#: calculate time to first offer
#  Notes:
#   - Using days between waitlist registration and match run.
#     Could use time at initial response, which may change by a day or two
#     but there are also some large outliers. 

first_offer = offers %>% 
  slice_min(MATCH_SUBMIT_DT, by = WL_ID_CODE, with_ties = FALSE) %>% 
  transmute(
    LISTING_CTR_CODE, 
    WL_ID_CODE, 
    STATUS_AT_LISTING, 
    AGE_PT, 
    PTR_SEQUENCE_NUM,
    WL_DATE,
    YR = format(WL_DATE, "%Y"),
    wait_days = as.numeric(as.Date(MATCH_SUBMIT_DT) - WL_DATE)
  )

Waiting time until first offer:

Code
#: summarize first offer waiting time by listing center
waiting_time = left_join(
  first_offer %>% 
    group_by(LISTING_CTR_CODE) %>% 
      summarize(
        n_total = n(),
        median_wait_days = median(wait_days)
      ) %>% 
    ungroup(), 
  
  first_offer %>% 
    group_by(LISTING_CTR_CODE, STATUS_AT_LISTING) %>% 
      summarize(
        n = n(),
        median_wait_days = median(wait_days),
        .groups = "drop_last"
      ) %>% 
      mutate(p = n/sum(n)) %>% 
    ungroup() %>% 
    complete(LISTING_CTR_CODE, STATUS_AT_LISTING, 
             fill = list(n = 0L, p = 0)) %>% 
    mutate(STATUS_AT_LISTING = str_remove(STATUS_AT_LISTING, "Status ")) %>% 
    pivot_wider(
      names_from = c(STATUS_AT_LISTING), 
      values_from = c(n, p, median_wait_days), 
    ),
  by = "LISTING_CTR_CODE"
)

Save results

  • n_total: total number of candidates at center
  • n_{1A, 1B, 2}: number of candidates at each status
  • p_{1A, 1B, 2}: proportion of candidates at each status
  • median_wait_days: overall median number of days until first offer
  • median_wait_days_{1A, 1B, 2}: median number of days until first offer (for status)
Code
waiting_time 
Code
waiting_time %>% write_csv("data/waiting_time.csv")

Refusal statistics

We have, for each waitlist, the number of times a donor was refused. However, there can be censoring due to the candidate being removed from the waitlist before an offer is accepted.

The approach is to model the number of refusals before acceptance in a survival analysis setting. Specifically, we are using a Kaplan-Meier estimator (by LISTING_CTR_CODE) to estimate the median number of refusals per candidate.

Notes:

  • We use a pseudo-bayesian approach to help estimatation for centers with few candidates. We take the aggregate known (uncensored) n_refusals and treat as additional data before fitting a KM.

  • We are not controlling/adjusting for candidate status, size, etc. We could by moving to a cox-ph or e.g., only using status 1A, etc.

Offer and Refusal data for each waitlist (in population)

Note: using offers dataframe from above section.

The n_refusals is the outcome and accept is the event/censoring indicator.

Code
refusals_WL = 
  offers %>% 
  group_by(WL_ID_CODE, LISTING_CTR_CODE, STATUS_AT_LISTING) %>% 
    summarize(
      n_offers = n_distinct(DONOR_ID),
      accept = any(OFFER_ACCEPT == "Y"),
      n_refusals = n_offers - accept,
      .groups = "drop"
    )
refusals_WL

Set-up “prior” data

Code
#: number of pseudo observations
k = 3

#: create pseudo observations; use frequency weighting w
prior = 
  refusals_WL %>% 
  filter(accept) %>% 
  count(n_refusals) %>% 
  mutate(w = k*n/sum(n), accept=TRUE)

#: replicate for all LISTING_CTR_CODES
LC = refusals_WL$LISTING_CTR_CODE %>% unique
prior_df = 
  map(LC,\(x) prior %>% mutate(LISTING_CTR_CODE = x)) %>% 
  bind_rows()

#: Data for modeling
data_KM = 
  bind_rows(
    refusals_WL %>% mutate(w=1), # set weights = 1 for original data
    prior_df                     # add on the pseudo observations
  )

Fit the KM model

Use fitted KM to estimate median and mean number of refusals before acceptance:

Code
library(survival)
KM_fit = survfit(Surv(n_refusals, accept) ~ LISTING_CTR_CODE, 
                 data = data_KM, 
                 weight = w) 

refusals_KM = 
  summary(KM_fit)$table %>% 
  as_tibble(rownames = "LISTING_CTR_CODE") %>% 
  transmute(
    LISTING_CTR_CODE = str_remove(LISTING_CTR_CODE, "LISTING_CTR_CODE="),
    # records, 
    # n = n.max - k, 
    # events = events - k, 
    refusals_mean = rmean, 
    refusals_median = median
  )
refusals_KM

Save Data

  • n_candidates: number of candidates listed at center
  • median_refusals: median number of offer refusals per candidate
  • median_refusals_old: the orginal, biased, estimation that didn’t account for censoring or small sample sizes
  • mean_refusals: mean number of offer refusals per candidate
  • n_{offers, refusals}: number of offers/refusals at center
Code
refusals = refusals_WL %>% 
  group_by(LISTING_CTR_CODE) %>% 
    summarize(
      n_candidates = n_distinct(WL_ID_CODE),
      median_refusals_old = median(n_refusals),
      n_offers = sum(n_offers),
      n_refusals = sum(n_refusals),
      p_refusals = n_refusals / n_offers,
      # n_tx = sum(accept),
      # p_tx = mean(accept)
    ) %>% 
  ungroup() %>% 
  left_join(
    refusals_KM %>% 
      rename(
        median_refusals = refusals_median, 
        mean_refusals = refusals_mean
      ), 
    by = "LISTING_CTR_CODE"
  )

refusals 
Code
refusals %>% write_csv("data/refusals.csv")